Minimizing Single-usage Cache Pollution for Effective Cache Hierarchy Management
Authors: Thomas Piquet, Olivier Rochecouste, André Seznec
Abstract
Efficient cache hierarchy management is of paramount importance when designing high-performance processors. Upon a miss, the conventional operation mode of a cache hierarchy is to retrieve the missing block from higher levels and to store it in all hierarchy levels. It is, however, difficult to assert that storing the block in intermediate levels will actually be useful. In the literature, this phenomenon, referred to as cache pollution, is often associated with prefetching techniques: a prefetched block can evict data that is more likely to be reused in the near future. Cache pollution can cause severe performance degradation. This paper addresses this phenomenon in the highest level of the cache hierarchy. Unlike past studies that treat polluting cache blocks as blocks that are never accessed (i.e., that are due only to prefetching), our proposal attempts to eliminate cache pollution that is inherent to the application. Our observations reveal that cache blocks that are accessed only once (single-usage blocks) account for a significant share of blocks at runtime, especially in the highest level of the cache hierarchy. In addition, most single-usage cache blocks are data that can be prefetched. We show that a simple prediction mechanism is sufficient to uncover most of the single-usage blocks. For a two-level cache hierarchy, these blocks are sent directly from main memory to the L1 cache. Bypassing the L2 cache for such data makes more effective use of the memory hierarchy and allows hard-to-prefetch memory references to remain in this cache hierarchy level. Our experimental results show that minimizing single-usage cache pollution in the L2 cache leads to a significant decrease in its miss rate and therefore to noticeable performance gains.
Key-words: Computer Architecture, Memory Hierarchy, Cache Pollution, Single-Usage Data, Block-Usage Prediction, Hardware Prefetching.
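The abstract does not describe the prediction mechanism itself, so the following is only a minimal sketch of the general idea of predicting single-usage blocks and bypassing the L2 cache for them. Everything specific in it is an assumption made for illustration and is not taken from the paper: the predictor is indexed by the program counter (PC) of the instruction that caused the fill, it uses small saturating counters trained when a block is evicted, and the L2 cache is modelled as a fully associative LRU cache. The names `BypassingL2`, `make_trace` and `run_trace` are hypothetical.

```python
from collections import OrderedDict


class BypassingL2:
    """Toy fully associative LRU L2 with a PC-indexed single-usage predictor.

    Illustrative assumption only; not the mechanism evaluated in the paper.
    """

    def __init__(self, capacity, enable_bypass=True, threshold=2, max_count=3):
        self.capacity = capacity
        self.enable_bypass = enable_bypass
        self.threshold = threshold      # counter value at which we bypass
        self.max_count = max_count      # saturation limit of each counter
        self.blocks = OrderedDict()     # block -> [filling PC, use count]
        self.counters = {}              # PC -> saturating counter
        self.misses = 0
        self.bypasses = 0

    def _train(self, pc, uses):
        # Called on eviction: a block used only once (its fill) counts as
        # single-usage and strengthens the prediction; reuse weakens it.
        count = self.counters.get(pc, 0)
        if uses == 1:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.counters[pc] = count

    def access(self, pc, block):
        """Return True on an L2 hit, False on an L2 miss."""
        if block in self.blocks:
            self.blocks[block][1] += 1
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.misses += 1
        if self.enable_bypass and self.counters.get(pc, 0) >= self.threshold:
            self.bypasses += 1              # block would go to L1 only
            return False
        if len(self.blocks) >= self.capacity:
            _, (old_pc, uses) = self.blocks.popitem(last=False)  # evict LRU
            self._train(old_pc, uses)
        self.blocks[block] = [pc, 1]
        return False


def make_trace(rounds):
    """Hypothetical trace: PC 0x20 reuses two blocks, PC 0x10 streams."""
    trace = []
    for r in range(rounds):
        trace += [(0x20, 0), (0x20, 1), (0x20, 0), (0x20, 1)]
        trace += [(0x10, 100 + 3 * r + k) for k in range(3)]
    return trace


def run_trace(trace, enable_bypass):
    cache = BypassingL2(capacity=4, enable_bypass=enable_bypass)
    for pc, block in trace:
        cache.access(pc, block)
    return cache


if __name__ == "__main__":
    for flag in (False, True):
        cache = run_trace(make_trace(10), enable_bypass=flag)
        print(f"bypass={flag}: misses={cache.misses}, bypasses={cache.bypasses}")
```

On this toy trace the streaming blocks evict the reused blocks when bypassing is disabled, while the bypassing run learns to keep the streaming blocks out of L2 and misses less often. One limitation of the sketch is that a bypassed block is never observed again, so a PC that has been marked single-usage is never retrained; a real design would need some way to correct such mispredictions.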
Similar resources
Exploiting Single-Usage for Effective Memory Management
Efficient memory management is crucial when designing high performance processors. Upon a miss, the conventional operation mode of a cache hierarchy is to retrieve the missing block from lower levels and to store it into all hierarchy levels. It is however difficult to assert that storing the block into intermediate levels will be really useful. In particular, this is unnecessary if a cache blo...
Zero-Content Augmented Cache
It has been observed that some applications manipulate large amounts of null data. Moreover, these zero data often exhibit high spatial locality. In some applications, more than 20% of the data accesses concern null data blocks. Storing a null block in a standard cache line is a waste of resources. In this paper, we propose the Zero-Content Augmented cache, the ZCA cache. ...
Software Assistance for Data Caches
Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of combined though simple software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial locality, two hardware modifications are pro...
Communication-Minimizing Algorithms for Matrix Multiplication
As computers increase in speed, the proportion of time spent on communication between cache and hard drive or between multiple processors continues to rise. For single processors, data must be moved between the processor’s fast-access cache and main memory, an operation that often takes many orders of magnitude longer than any arithmetic operation. When multiple levels of cache are present, a c...
Data Cache Performance When Vector-Like Accesses Bypass the Cache
A Stream Memory Controller, when added to a conventional memory hierarchy, routes vector-like accesses around the data cache. A memory system was simulated under these conditions and the data cache performance increased dramatically. The gain in performance was a result of the increased temporal locality of the access pattern. The access pattern also showed a decrease in spatial locality, makin...